Development and Q&A

Gerko Vink

Methodology & Statistics @ Utrecht University

13 Jun 2025

Disclaimer

I owe a debt of gratitude to many people, as the thoughts and code in these slides are the product of years-long development cycles and discussions with my team, friends, colleagues and peers. When someone has contributed to the content of the slides, I have credited their authorship.

Images are either directly linked, or generated with StableDiffusion or DALL-E. That said, there is no information in this presentation that exceeds legal use of copyright materials in academic settings, or that should not be part of the public domain.

Warning

You may use any and all content in this presentation - including my name - and submit it as input to generative AI tools, with the following exception:

  • You must ensure that the content is not used for further training of the model

Slide materials and source code

Materials

Recap

Yesterday we covered these topics:

  • Basic plots: histograms, scatterplots and boxplots
  • Advanced plots with ggplot2
  • Customizing plots for publication
  • Exporting plots and results

Today

Today we cover the following topics:

  • Building your own R packages
  • Summary + Q&A

Summary (with a few new things sneaked in)

Simple data containers

c(1, 2, 3, 4, 5) # a vector with the values 1 through 5
[1] 1 2 3 4 5
1:10 # a vector with the values 1 through 10
 [1]  1  2  3  4  5  6  7  8  9 10
letters[1:10] # the first 10 elements of the letters object
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
a <- c(1, 2, 3, 4, 5) # a vector assigned to a
a[3] # element 3 of a
[1] 3
a[1:3] # elements 1, 2 and 3 of a
[1] 1 2 3
a[5:3] # elements 5, 4 and 3 of a
[1] 5 4 3
c(a, "oeps") # everything is now character
[1] "1"    "2"    "3"    "4"    "5"    "oeps"
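The last line shows implicit coercion: a vector holds a single type, so mixing numbers with a string turns every element into character. A minimal base-R sketch of how to check the type and convert back (the object names `mixed` and `back` are illustrative and not from the slides):

```r
a <- c(1, 2, 3, 4, 5)
mixed <- c(a, "oeps")          # the numbers are coerced to character

class(a)                       # type of the original vector
class(mixed)                   # type after adding a string

# Converting back gives NA for anything that is not a number;
# suppressWarnings() hides the "NAs introduced by coercion" warning.
back <- suppressWarnings(as.numeric(mixed))
back
```

Checking `class()` after combining objects is a cheap way to catch this kind of silent conversion early.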

Matrices and data frames

m <- matrix(1:9, nrow = 3) # a matrix with 3 rows and 3 columns
m
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
m[2, 3] # element in row 2, column 3
[1] 8
m[1:2, 2:3] # elements in rows 1 and 2, columns 2 and 3
     [,1] [,2]
[1,]    4    7
[2,]    5    8
df <- data.frame(a = 1:3, b = letters[1:3]) # data frame with 3 rows and 2 columns
df[, 2] # second column of df
[1] "a" "b" "c"
df$b # second column of df, which is named b, so we can use $
[1] "a" "b" "c"
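Both calls above return the column as a plain vector. As a small base-R sketch, `[[` extracts the vector just like `$`, while single brackets with a name, or `drop = FALSE`, keep the rectangular structure:

```r
df <- data.frame(a = 1:3, b = letters[1:3])

df[["b"]]              # the column as a vector, same as df$b
df["b"]                # still a data frame, with one column
df[, 2, drop = FALSE]  # also a data frame with one column

m <- matrix(1:9, nrow = 3)
m[2, ]                 # row 2 as a vector
m[2, , drop = FALSE]   # row 2 as a 1 x 3 matrix
```

Keeping the structure with `drop = FALSE` tends to matter inside functions, where a result that sometimes is a vector and sometimes a matrix is easy to trip over.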

Reading in datasets

library(haven) # for reading SPSS, Stata and SAS files
library(magrittr) # for pipes
library(dplyr) # for data manipulation
# read in a Stata file
stata <- read_dta("files/03-poverty-analysis-data-2022-rt001-housing-plus.dta") 
stata %>% glimpse()
Rows: 2,502
Columns: 114
$ hhid          <dbl> 1102500401, 1102500501, 1102500502, 1102501202, 11074305…
$ domain2       <dbl+lbl> 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 2.0, 2.…
$ psu           <dbl> 10250, 10250, 10250, 10250, 10743, 10743, 10743, 10743, …
$ domain        <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ gp_subdom     <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ district      <chr> "Paramaribo", "Paramaribo", "Paramaribo", "Paramaribo", …
$ fortnight     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ panel         <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ hhid16        <dbl> 6015041, 6015051, 6015052, 6015122, 6039051, 6039081, 60…
$ lat_cen       <dbl> 5.847621, 5.847621, 5.847621, 5.847621, 5.819147, 5.8191…
$ long_cen      <dbl> -55.17032, -55.17032, -55.17032, -55.17032, -55.21745, -…
$ result        <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ end_date_n    <date> 2022-01-04, 2022-01-05, 2022-01-10, 2022-01-04, 2022-01…
$ Year_s        <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 20…
$ Month_s       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ Day           <dbl> 4, 5, 10, 4, 5, 5, 5, 5, 5, 7, 7, 7, 7, 7, 7, 15, 9, 14,…
$ stratum       <dbl> 2, 2, 2, 2, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 6, 6, 6, 6,…
$ hhid_text     <chr> "01102500401", "01102500501", "01102500502", "0110250120…
$ HHsize        <dbl> 1, 3, 1, 4, 1, 2, 5, 1, 3, 1, 7, 4, 1, 1, 4, 2, 2, 2, 2,…
$ HHsize2       <dbl> 1, 2, 1, 4, 1, 2, 5, 1, 3, 1, 7, 4, 1, 1, 4, 2, 2, 2, 2,…
$ interv        <dbl> 75, 75, 75, 75, 92, 92, 92, 92, 92, 83, 77, 74, 99, 74, …
$ end_date      <chr> "04/01/22", "05/01/22", "10/01/22", "04/01/22", "05/01/2…
$ q17_02        <dbl+lbl> 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2…
$ q17_03a       <dbl+lbl>  1,  1,  1, NA,  1,  1,  1,  1,  1,  1,  1,  1,  1, …
$ q17_03b       <dbl+lbl>  2,  2,  2, NA,  2,  2,  2,  2,  2,  2,  2,  2,  2, …
$ q17_04        <dbl+lbl> NA, NA, NA,  1, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12a          <dbl+lbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ q12_01a       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12_01b       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12_02a       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12_02b       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12_03a       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12_03b       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12_04a       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12_04b       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12_05        <dbl+lbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q13_01        <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ q13_01_ot     <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", …
$ q13_02        <dbl+lbl> 2, 1, 1, 5, 2, 2, 3, 2, 2, 5, 8, 1, 3, 2, 2, 3, 8, 2…
$ q13_03        <dbl+lbl>  5,  5,  1, NA,  2,  2,  5,  5,  2, NA, NA,  5,  3, …
$ q13_04        <dbl> 1985, 2006, 2003, -1, 1982, 1980, 1982, 1992, 1980, 2005…
$ q13_05        <dbl+lbl> 1, 2, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2…
$ q13_06        <chr> "DAKPLATEN VERWISSELEN EN KEUKEN BIJGEBOU", "", "", "", …
$ q13_07        <dbl> 2020, NA, NA, NA, 2020, 2017, 2019, NA, 2017, 2021, 2020…
$ q13_08        <dbl+lbl> 2, 5, 2, 5, 1, 1, 5, 2, 1, 5, 1, 5, 5, 5, 2, 5, 2, 5…
$ q13_09        <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 2, 1, 1, 1, 1, 1, 1, 1…
$ q13_10        <dbl+lbl> 2, 5, 5, 1, 2, 1, 1, 1, 2, 3, 1, 5, 1, 5, 4, 1, 5, 5…
$ q13_11        <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ q13_12a       <dbl+lbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2…
$ q13_12b       <dbl+lbl> 2, 2, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ q13_12c       <dbl+lbl> 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2…
$ q13_12d       <dbl+lbl> 2, 1, 1, 1, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ q13_12e       <dbl+lbl> 1, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2…
$ q13_12f       <dbl+lbl> 2, 1, 1, 2, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ q13_12g       <dbl+lbl> 2, 1, 2, 2, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2…
$ q13_13        <dbl+lbl> 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4…
$ q13_14        <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ q13_14_ot     <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", …
$ q13_15        <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4, 5…
$ q13_16        <dbl+lbl>  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1, …
$ q13_17        <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ q13_18        <dbl> 2, 4, 2, 2, 4, 3, 4, 5, 3, 4, 4, 4, 4, 3, 5, 2, 4, 4, 2,…
$ q13_19        <dbl> 2, 4, 2, 2, 2, 3, 4, 2, 3, 2, 3, 4, 3, 2, 5, 2, 3, 3, 2,…
$ q13_20        <dbl> 1, 2, 2, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 1, 2, 2, 3, 2, 2,…
$ q13_21        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ q13_22        <dbl> 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0,…
$ q13_23a       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1,…
$ q13_23b       <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
$ q13_23c       <dbl> 1, 0, 1, 1, 2, 2, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 4, 1, 2,…
$ q13_23d       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,…
$ q13_23e       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,…
$ q13_23f       <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0,…
$ q13_23h       <dbl> 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,…
$ q13_23i       <dbl> 0, 1, 1, 0, 1, 2, 2, 1, 2, 1, 2, 1, 0, 0, 2, 0, 2, 0, 1,…
$ q13_23j       <dbl> 1, 2, 3, 1, 1, 2, 5, 1, 2, 1, 5, 4, 1, 0, 4, 0, 2, 0, 1,…
$ q13_23k       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 1, 1, 1,…
$ q13_23l       <dbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 2, 1, 0, 1, 0,…
$ q13_23m       <dbl> 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0,…
$ q13_23n       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ q13_24        <dbl+lbl> 2, 2, 2, 2, 1, 1, 1, 2, 1, 2, 2, 3, 5, 3, 2, 1, 2, 2…
$ q19_01a       <dbl+lbl> 1, 1, 2, 1, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1…
$ q19_01b       <dbl+lbl>  1,  2,  2,  1,  1,  2,  2,  2,  2,  2,  2,  2,  2, …
$ q19_01c       <dbl+lbl>  1,  2,  2,  1,  1,  2,  2,  2,  2,  2,  1,  2,  2, …
$ q19_01d       <dbl+lbl> 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1…
$ q19_01e       <dbl+lbl> 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1…
$ q19_01f       <dbl+lbl>  1,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  1, …
$ q19_01g       <dbl+lbl> 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1…
$ q19_01h       <dbl+lbl> 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1…
$ q20_01_2001a  <dbl+lbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2…
$ q20_01_2001b  <dbl+lbl> 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2…
$ q20_01_2001c  <dbl+lbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q20_01_2001d  <dbl+lbl> 1, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 2, 1…
$ q20_01_2001e  <dbl+lbl> 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 1, 2…
$ q20_01_2001f  <dbl+lbl> 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ q20_01_2001g  <dbl+lbl> 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ q20_01_2001h  <dbl+lbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ q20_01_2001i  <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", …
$ weight2       <dbl> 36.20733, 50.22768, 36.41680, 32.17578, 56.93408, 64.684…
$ weight3       <dbl> 36.20733, 150.68304, 36.41680, 128.70313, 56.93408, 129.…
$ quintile      <dbl+lbl> 5, 3, 5, 2, 5, 5, 4, 5, 4, 5, 4, 5, 4, 2, 4, 5, 5, 3…
$ decile        <dbl+lbl> 10,  6, 10,  4, 10, 10,  8, 10,  7, 10,  7,  9,  8, …
$ centile       <dbl> 100, 60, 98, 33, 98, 99, 75, 99, 66, 99, 65, 89, 78, 29,…
$ cons_pc       <dbl> 22965.266, 4823.205, 14460.016, 3115.352, 15724.735, 180…
$ food_pc       <dbl> 14684.6487, 1991.8493, 5201.2645, 1488.5779, 5734.7305, …
$ nonfood_pc    <dbl> 8280.6176, 2831.3561, 9258.7511, 1626.7739, 9990.0040, 1…
$ line_extreme  <dbl> 1011.0048, 1011.0048, 1011.0048, 1011.0048, 1011.0048, 1…
$ line_moderate <dbl> 2659.820, 2659.820, 2659.820, 2659.820, 2659.820, 2659.8…
$ poor_extreme  <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ poor_all      <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ poor          <dbl+lbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
$ Year          <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 20…
$ CPI_june2017  <dbl> 127.2, 127.2, 127.2, 127.2, 127.2, 127.2, 127.2, 127.2, …
$ CPI_june2022  <dbl> 460.825, 460.825, 460.825, 460.825, 460.825, 460.825, 46…
$ CPI_2017_22   <dbl> 3.622838, 3.622838, 3.622838, 3.622838, 3.622838, 3.6228…

Inspecting datasets

stata %>% head(n = 5) # first 5 rows of the dataset
# A tibble: 5 × 114
        hhid domain2     psu domain  gp_subdom district fortnight panel   hhid16
       <dbl> <dbl+lbl> <dbl> <dbl+l> <dbl+lbl> <chr>        <dbl> <dbl+l>  <dbl>
1 1102500401 1.1       10250 1 [Gre… 1 [Param… Paramar…         1 1 [Pan… 6.02e6
2 1102500501 1.1       10250 1 [Gre… 1 [Param… Paramar…         1 1 [Pan… 6.02e6
3 1102500502 1.1       10250 1 [Gre… 1 [Param… Paramar…         1 1 [Pan… 6.02e6
4 1102501202 1.1       10250 1 [Gre… 1 [Param… Paramar…         1 1 [Pan… 6.02e6
5 1107430501 1.1       10743 1 [Gre… 1 [Param… Paramar…         1 1 [Pan… 6.04e6
# ℹ 105 more variables: lat_cen <dbl>, long_cen <dbl>, result <dbl+lbl>,
#   end_date_n <date>, Year_s <dbl>, Month_s <dbl>, Day <dbl>, stratum <dbl>,
#   hhid_text <chr>, HHsize <dbl>, HHsize2 <dbl>, interv <dbl>, end_date <chr>,
#   q17_02 <dbl+lbl>, q17_03a <dbl+lbl>, q17_03b <dbl+lbl>, q17_04 <dbl+lbl>,
#   q12a <dbl+lbl>, q12_01a <dbl>, q12_01b <dbl>, q12_02a <dbl>, q12_02b <dbl>,
#   q12_03a <dbl>, q12_03b <dbl>, q12_04a <dbl>, q12_04b <dbl>,
#   q12_05 <dbl+lbl>, q13_01 <dbl+lbl>, q13_01_ot <chr>, q13_02 <dbl+lbl>, …
stata %>% slice_head(n = 2) # first 2 rows of the dataset
# A tibble: 2 × 114
        hhid domain2     psu domain  gp_subdom district fortnight panel   hhid16
       <dbl> <dbl+lbl> <dbl> <dbl+l> <dbl+lbl> <chr>        <dbl> <dbl+l>  <dbl>
1 1102500401 1.1       10250 1 [Gre… 1 [Param… Paramar…         1 1 [Pan… 6.02e6
2 1102500501 1.1       10250 1 [Gre… 1 [Param… Paramar…         1 1 [Pan… 6.02e6
# ℹ 105 more variables: lat_cen <dbl>, long_cen <dbl>, result <dbl+lbl>,
#   end_date_n <date>, Year_s <dbl>, Month_s <dbl>, Day <dbl>, stratum <dbl>,
#   hhid_text <chr>, HHsize <dbl>, HHsize2 <dbl>, interv <dbl>, end_date <chr>,
#   q17_02 <dbl+lbl>, q17_03a <dbl+lbl>, q17_03b <dbl+lbl>, q17_04 <dbl+lbl>,
#   q12a <dbl+lbl>, q12_01a <dbl>, q12_01b <dbl>, q12_02a <dbl>, q12_02b <dbl>,
#   q12_03a <dbl>, q12_03b <dbl>, q12_04a <dbl>, q12_04b <dbl>,
#   q12_05 <dbl+lbl>, q13_01 <dbl+lbl>, q13_01_ot <chr>, q13_02 <dbl+lbl>, …

Correcting datasets

stata %>% 
  as_factor() %>% # convert all labelled variables to factors
  glimpse()
Rows: 2,502
Columns: 114
$ hhid          <dbl> 1102500401, 1102500501, 1102500502, 1102501202, 11074305…
$ domain2       <fct> 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1, Rest of the…
$ psu           <dbl> 10250, 10250, 10250, 10250, 10743, 10743, 10743, 10743, …
$ domain        <fct> Great Paramaribo, Great Paramaribo, Great Paramaribo, Gr…
$ gp_subdom     <fct> Paramaribo, Paramaribo, Paramaribo, Paramaribo, Paramari…
$ district      <chr> "Paramaribo", "Paramaribo", "Paramaribo", "Paramaribo", …
$ fortnight     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ panel         <fct> Panel, Panel, Panel, Panel, Panel, Panel, Panel, Panel, …
$ hhid16        <dbl> 6015041, 6015051, 6015052, 6015122, 6039051, 6039081, 60…
$ lat_cen       <dbl> 5.847621, 5.847621, 5.847621, 5.847621, 5.819147, 5.8191…
$ long_cen      <dbl> -55.17032, -55.17032, -55.17032, -55.17032, -55.21745, -…
$ result        <fct> Interview finalized - Fully completed, Interview finaliz…
$ end_date_n    <date> 2022-01-04, 2022-01-05, 2022-01-10, 2022-01-04, 2022-01…
$ Year_s        <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 20…
$ Month_s       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ Day           <dbl> 4, 5, 10, 4, 5, 5, 5, 5, 5, 7, 7, 7, 7, 7, 7, 15, 9, 14,…
$ stratum       <dbl> 2, 2, 2, 2, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 6, 6, 6, 6,…
$ hhid_text     <chr> "01102500401", "01102500501", "01102500502", "0110250120…
$ HHsize        <dbl> 1, 3, 1, 4, 1, 2, 5, 1, 3, 1, 7, 4, 1, 1, 4, 2, 2, 2, 2,…
$ HHsize2       <dbl> 1, 2, 1, 4, 1, 2, 5, 1, 3, 1, 7, 4, 1, 1, 4, 2, 2, 2, 2,…
$ interv        <dbl> 75, 75, 75, 75, 92, 92, 92, 92, 92, 83, 77, 74, 99, 74, …
$ end_date      <chr> "04/01/22", "05/01/22", "10/01/22", "04/01/22", "05/01/2…
$ q17_02        <fct> Yes, Yes, Yes, No, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye…
$ q17_03a       <fct> YES, YES, YES, NA, YES, YES, YES, YES, YES, YES, YES, YE…
$ q17_03b       <fct> NO, NO, NO, NA, NO, NO, NO, NO, NO, NO, NO, NO, NO, NA, …
$ q17_04        <fct> NA, NA, NA, "Device cost (cell phone, computer, tablet)"…
$ q12a          <fct> No, No, No, No, No, No, No, No, No, No, No, No, No, No, …
$ q12_01a       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12_01b       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12_02a       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12_02b       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12_03a       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12_03b       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12_04a       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12_04b       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q12_05        <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q13_01        <fct> House, House, House, House, House, House, House, House, …
$ q13_01_ot     <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", …
$ q13_02        <fct> Owned (without mortgage), Owned (with mortgage), Owned (…
$ q13_03        <fct> Gov't rented/ Leased, Gov't rented/ Leased, Owned (with …
$ q13_04        <dbl> 1985, 2006, 2003, -1, 1982, 1980, 1982, 1992, 1980, 2005…
$ q13_05        <fct> Yes, No, No, No, Yes, Yes, Yes, No, Yes, Yes, Yes, No, N…
$ q13_06        <chr> "DAKPLATEN VERWISSELEN EN KEUKEN BIJGEBOU", "", "", "", …
$ q13_07        <dbl> 2020, NA, NA, NA, 2020, 2017, 2019, NA, 2017, 2021, 2020…
$ q13_08        <fct> Wood and building stones/bricks, Building stones, Wood a…
$ q13_09        <fct> Fully enclosed masonry, Fully enclosed masonry, Fully en…
$ q13_10        <fct> Wood, Tiles, Tiles, Masonry, Wood, Masonry, Masonry, Mas…
$ q13_11        <fct> "Metal roofing sheets/roof tiles (zinc, galva, aluminum)…
$ q13_12a       <fct> No, No, No, No, No, No, No, No, No, No, No, No, No, No, …
$ q13_12b       <fct> No, No, No, No, Yes, Yes, Yes, No, Yes, No, No, No, No, …
$ q13_12c       <fct> Yes, Yes, Yes, No, Yes, Yes, Yes, Yes, Yes, Yes, No, No,…
$ q13_12d       <fct> No, Yes, Yes, Yes, No, Yes, No, Yes, Yes, No, No, No, No…
$ q13_12e       <fct> Yes, Yes, Yes, Yes, Yes, No, Yes, No, No, No, No, No, No…
$ q13_12f       <fct> No, Yes, Yes, No, Yes, Yes, No, Yes, No, No, No, No, No,…
$ q13_12g       <fct> No, Yes, No, No, No, No, Yes, Yes, No, Yes, No, No, No, …
$ q13_13        <fct> Gas (propane), Gas (propane), None (does not cook), Gas …
$ q13_14        <fct> WC with flushing (linked to septic tank), WC with flushi…
$ q13_14_ot     <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", …
$ q13_15        <fct> "Piped water in dwelling", "Piped water in dwelling", "P…
$ q13_16        <fct> SWM, SWM, SWM, SWM, SWM, SWM, SWM, SWM, SWM, SWM, SWM, S…
$ q13_17        <fct> Electricity directly from EBS, Electricity directly from…
$ q13_18        <dbl> 2, 4, 2, 2, 4, 3, 4, 5, 3, 4, 4, 4, 4, 3, 5, 2, 4, 4, 2,…
$ q13_19        <dbl> 2, 4, 2, 2, 2, 3, 4, 2, 3, 2, 3, 4, 3, 2, 5, 2, 3, 3, 2,…
$ q13_20        <dbl> 1, 2, 2, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 1, 2, 2, 3, 2, 2,…
$ q13_21        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ q13_22        <dbl> 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0,…
$ q13_23a       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1,…
$ q13_23b       <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
$ q13_23c       <dbl> 1, 0, 1, 1, 2, 2, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 4, 1, 2,…
$ q13_23d       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,…
$ q13_23e       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,…
$ q13_23f       <dbl> 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0,…
$ q13_23h       <dbl> 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,…
$ q13_23i       <dbl> 0, 1, 1, 0, 1, 2, 2, 1, 2, 1, 2, 1, 0, 0, 2, 0, 2, 0, 1,…
$ q13_23j       <dbl> 1, 2, 3, 1, 1, 2, 5, 1, 2, 1, 5, 4, 1, 0, 4, 0, 2, 0, 1,…
$ q13_23k       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 1, 1, 1,…
$ q13_23l       <dbl> 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 2, 1, 0, 1, 0,…
$ q13_23m       <dbl> 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0,…
$ q13_23n       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ q13_24        <fct> Relatives, Relatives, Relatives, Relatives, This dweling…
$ q19_01a       <fct> Yes, Yes, No, Yes, Yes, No, No, No, No, No, Yes, No, No,…
$ q19_01b       <fct> Yes, No, No, Yes, Yes, No, No, No, No, No, No, No, No, N…
$ q19_01c       <fct> Yes, No, No, Yes, Yes, No, No, No, No, No, Yes, No, No, …
$ q19_01d       <fct> Yes, No, No, No, No, No, No, No, No, Yes, No, No, No, No…
$ q19_01e       <fct> Yes, Yes, No, Yes, No, No, No, No, No, No, Yes, No, No, …
$ q19_01f       <fct> Yes, No, No, No, No, No, No, No, No, No, No, No, Yes, No…
$ q19_01g       <fct> Yes, No, No, No, No, No, No, No, No, No, No, No, No, No,…
$ q19_01h       <fct> Yes, No, No, No, No, No, No, No, No, No, No, No, No, No,…
$ q20_01_2001a  <fct> No, No, No, No, No, No, No, No, No, Yes, No, No, Yes, No…
$ q20_01_2001b  <fct> Yes, No, No, Yes, No, No, No, No, No, Yes, No, No, No, N…
$ q20_01_2001c  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ q20_01_2001d  <fct> Yes, Yes, No, Yes, Yes, No, No, No, No, No, No, Yes, Yes…
$ q20_01_2001e  <fct> Yes, No, No, Yes, No, No, No, No, No, Yes, No, Yes, Yes,…
$ q20_01_2001f  <fct> Yes, No, No, No, No, No, No, No, No, No, No, No, No, No,…
$ q20_01_2001g  <fct> No, Yes, No, No, No, No, Yes, No, No, No, No, No, No, No…
$ q20_01_2001h  <fct> No, No, No, No, No, No, No, No, No, No, No, No, No, No, …
$ q20_01_2001i  <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", …
$ weight2       <dbl> 36.20733, 50.22768, 36.41680, 32.17578, 56.93408, 64.684…
$ weight3       <dbl> 36.20733, 150.68304, 36.41680, 128.70313, 56.93408, 129.…
$ quintile      <fct> Q5, Q3, Q5, Q2, Q5, Q5, Q4, Q5, Q4, Q5, Q4, Q5, Q4, Q2, …
$ decile        <fct> D10, D, D10, D4, D10, D10, D8, D10, D7, D10, D7, D9, D8,…
$ centile       <dbl> 100, 60, 98, 33, 98, 99, 75, 99, 66, 99, 65, 89, 78, 29,…
$ cons_pc       <dbl> 22965.266, 4823.205, 14460.016, 3115.352, 15724.735, 180…
$ food_pc       <dbl> 14684.6487, 1991.8493, 5201.2645, 1488.5779, 5734.7305, …
$ nonfood_pc    <dbl> 8280.6176, 2831.3561, 9258.7511, 1626.7739, 9990.0040, 1…
$ line_extreme  <dbl> 1011.0048, 1011.0048, 1011.0048, 1011.0048, 1011.0048, 1…
$ line_moderate <dbl> 2659.820, 2659.820, 2659.820, 2659.820, 2659.820, 2659.8…
$ poor_extreme  <fct> Not extreme poor, Not extreme poor, Not extreme poor, No…
$ poor_all      <fct> Not poor, Not poor, Not poor, Not poor, Not poor, Not po…
$ poor          <fct> Non poor, Non poor, Non poor, Non poor, Non poor, Non po…
$ Year          <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 20…
$ CPI_june2017  <dbl> 127.2, 127.2, 127.2, 127.2, 127.2, 127.2, 127.2, 127.2, …
$ CPI_june2022  <dbl> 460.825, 460.825, 460.825, 460.825, 460.825, 460.825, 46…
$ CPI_2017_22   <dbl> 3.622838, 3.622838, 3.622838, 3.622838, 3.622838, 3.6228…

Note that this only works well if the variables are labelled correctly!

Tables and cross-tabulations

stata %$% # exposition pipe 
  table(HHsize) # frequency table of HHsize
HHsize
  1   2   3   4   5   6   7   8   9  10  11  12  16 
563 617 445 407 252 103  45  34  17  13   1   3   2 
stata %$% # exposition pipe
  table(HHsize, district) # cross-tabulation of HHsize and district
      district
HHsize Brokopondo Commewijne Coronie Marowijne Nickerie Para Paramaribo
    1           1         38      10        18       38   11        219
    2           2         46       6        16       55    4        235
    3           1         31       2        13       48    3        169
    4           2         23       8         7       37    3        169
    5           0         20       3         4       20    2         92
    6           0          5       3         7        7    0         32
    7           0          2       1         8        5    0         10
    8           1          2       0         1        1    2          9
    9           0          2       0         0        1    1          5
    10          0          1       0         1        1    0          4
    11          0          0       0         0        0    0          0
    12          0          0       0         0        0    1          0
    16          0          0       0         0        0    0          1
      district
HHsize Saramacca Sipaliwini Wanica
    1         51         54    123
    2         70         28    155
    3         38         14    126
    4         38          9    111
    5         23          3     85
    6          3          5     41
    7          4          1     14
    8          4          1     13
    9          2          0      6
    10         0          1      5
    11         0          0      1
    12         0          2      0
    16         0          0      1
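The exposition pipe `%$%` makes the columns of a dataset available to the next call, much like base R's `with()`. A sketch on the built-in `mtcars` data, used here only as a stand-in because the Stata file does not ship with R, that also turns the counts into row proportions:

```r
# with() evaluates table() inside the data frame,
# comparable to `mtcars %$% table(cyl, gear)`
tab <- with(mtcars, table(cyl, gear))
tab

# Row proportions: within each number of cylinders, the share per gear count
props <- prop.table(tab, margin = 1)
round(props, 2)

# Counts with row and column totals added
addmargins(tab)
```

Use `margin = 2` for column proportions, or leave `margin` out to divide by the grand total.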

Manipulating datasets: filter OR select

stata %>% 
  filter(district == "Commewijne" | district == "Nickerie") %$% # filter on district
  table(district)
district
Commewijne   Nickerie 
       170        213 
stata %>% 
  select(district, HHsize) %>% # select columns
  head()
# A tibble: 6 × 2
  district   HHsize
  <chr>       <dbl>
1 Paramaribo      1
2 Paramaribo      3
3 Paramaribo      1
4 Paramaribo      4
5 Paramaribo      1
6 Paramaribo      2

Manipulating datasets: filter AND select

stata %>% 
  filter(district == "Commewijne" | district == "Nickerie") %>% # filter on district
  select(district, HHsize) %>% # select columns
  head()
# A tibble: 6 × 2
  district   HHsize
  <chr>       <dbl>
1 Commewijne      2
2 Commewijne      2
3 Commewijne      2
4 Commewijne      2
5 Commewijne      1
6 Commewijne      5
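When the filter has to match several values of the same column, chaining `==` with `|` gets long quickly. A base-R sketch of the `%in%` alternative, on a short vector of district names taken from the tables in these slides:

```r
# A few district names, in arbitrary order
district <- c("Paramaribo", "Commewijne", "Nickerie", "Wanica", "Commewijne")

keep_or <- district == "Commewijne" | district == "Nickerie"
keep_in <- district %in% c("Commewijne", "Nickerie")

identical(keep_or, keep_in)  # both approaches give the same logical vector
district[keep_in]
```

The same expression works inside `filter()`, for example `filter(district %in% c("Commewijne", "Nickerie"))`.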

Manipulating datasets: mutate

stata %<>% # assignment pipe, instead of `stata <- stata %>%`
  as_factor() %>% # convert all labelled variables to factors
  mutate(HHclass = ifelse(test = HHsize > 5, # if HHsize is greater than 5
                          yes = "groot", # then HHclass is "groot" (large)
                          no = "niet groot")) # otherwise HHclass is "niet groot" (not large)

stata %<>%
  mutate(HHclass2 = case_when(HHsize > 5 ~ "groot", 
                             HHsize <= 5 ~ "niet groot")) 
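Both versions create the same two classes. One detail worth checking is what happens to missing values. A base-R sketch with a toy vector (the values are made up for illustration and are not from the dataset):

```r
hh_size <- c(1, 6, NA, 5)  # made-up household sizes, including a missing value

# ifelse() returns NA wherever the test itself is NA
hh_class <- ifelse(hh_size > 5, "groot", "niet groot")
hh_class

# cut() is a base-R alternative when there are several ordered classes;
# with the default right = TRUE the intervals are (0, 5] and (5, Inf]
hh_cut <- cut(hh_size, breaks = c(0, 5, Inf), labels = c("niet groot", "groot"))
table(hh_cut, useNA = "ifany")
```

`case_when()` behaves the same way here: a row for which no condition is `TRUE` gets `NA`.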

Manipulating datasets: group_by and summarise

stata %>% 
  group_by(district) %>% # group by district
  summarise(mean_HHsize = mean(HHsize, na.rm = TRUE), # mean HHsize per district
            median_HHsize = median(HHsize, na.rm = TRUE), # median HHsize per district
            sd_HHsize = sd(HHsize, na.rm = TRUE), # standard deviation of HHsize per district
            IQR_HHsize = IQR(HHsize, na.rm = TRUE)) # interquartile range of HHsize per district
# A tibble: 10 × 5
   district   mean_HHsize median_HHsize sd_HHsize IQR_HHsize
   <chr>            <dbl>         <dbl>     <dbl>      <dbl>
 1 Brokopondo        3.43             3      2.30          2
 2 Commewijne        2.96             3      1.80          2
 3 Coronie           3.03             3      1.85          3
 4 Marowijne         3.37             3      2.22          3
 5 Nickerie          3.02             3      1.65          2
 6 Para              3.22             2      2.94          3
 7 Paramaribo        2.93             3      1.74          2
 8 Saramacca         2.87             2      1.69          2
 9 Sipaliwini        2.39             2      2.10          2
10 Wanica            3.32             3      1.97          2
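For comparison, grouped summaries can also be sketched in base R. The example below uses the built-in `mtcars` data as a stand-in for the survey data, so the variable names differ from the slide:

```r
# Mean and standard deviation of mpg per number of cylinders
aggregate(mpg ~ cyl, data = mtcars,
          FUN = function(x) c(mean = mean(x), sd = sd(x)))

# tapply() returns one value per group, named by group
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)
group_means
```

The `group_by()` and `summarise()` version is usually easier to read once several statistics are involved, which is why the slides use it.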

t-test

# var.test
stata %>% 
  filter(district == "Paramaribo" | district == "Wanica") %$% # filter on district
  var.test(HHsize ~ district) # F-test for equality of variances

    F test to compare two variances

data:  HHsize by district
F = 0.78023, num df = 944, denom df = 680, p-value = 0.0004469
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.6780808 0.8963271
sample estimates:
ratio of variances 
         0.7802292 
stata %>% 
  filter(district == "Paramaribo" | district == "Wanica") %$% # filter on district
  t.test(HHsize ~ district, var.equal = FALSE) # Welch t-test, which does not assume equal variances

    Welch Two Sample t-test

data:  HHsize by district
t = -4.1222, df = 1351.8, p-value = 3.981e-05
alternative hypothesis: true difference in means between group Paramaribo and group Wanica is not equal to 0
95 percent confidence interval:
 -0.5743252 -0.2039514
sample estimates:
mean in group Paramaribo     mean in group Wanica 
                2.928042                 3.317181 
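Because the F-test rejects equal variances, the Welch version of the t-test is the appropriate choice here. To show what `t.test()` computes in that case, here is a sketch that rebuilds the statistic and the degrees of freedom by hand on the built-in `mtcars` data (a stand-in for the survey data; the object names are illustrative):

```r
# Split mpg by transmission type (am: 0 = automatic, 1 = manual)
x <- mtcars$mpg[mtcars$am == 0]
y <- mtcars$mpg[mtcars$am == 1]

# Welch statistic: difference in means over the unpooled standard error
vx <- var(x) / length(x)
vy <- var(y) / length(y)
t_hand <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation of the degrees of freedom
df_hand <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Compare with t.test(); var.equal = FALSE is also the default
fit <- t.test(mpg ~ am, data = mtcars, var.equal = FALSE)
c(by_hand = t_hand, t_test = unname(fit$statistic))
c(by_hand = df_hand, t_test = unname(fit$parameter))
```

The fractional degrees of freedom in the output above (`df = 1351.8`) come from this approximation.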

\(\chi^2\) test

stata %>% 
  filter(district == "Paramaribo" | district == "Wanica") %$% # filter on district
  table(HHclass, district)
            district
HHclass      Paramaribo Wanica
  groot              61     81
  niet groot        884    600
stata %>% 
  filter(district == "Paramaribo" | district == "Wanica") %$% # filter on district
  table(HHclass, district) %>% 
  chisq.test() # chi-squared test on the cross-tabulation of HHclass and district

    Pearson's Chi-squared test with Yates' continuity correction

data:  .
X-squared = 14.017, df = 1, p-value = 0.0001812

Expected and observed frequencies

X2 <- stata %>% 
  filter(district == "Paramaribo" | district == "Wanica") %$% # filter on district
  table(HHclass, district) %>% 
  chisq.test() # chi-squared test on the cross-tabulation of HHclass and district
X2$observed # observed frequencies
            district
HHclass      Paramaribo Wanica
  groot              61     81
  niet groot        884    600
X2$expected # expected frequencies
            district
HHclass      Paramaribo    Wanica
  groot        82.52768  59.47232
  niet groot  862.47232 621.52768
stata %>% 
  filter(district == "Paramaribo" | district == "Wanica") %$% # filter on district
  table(HHclass, district) %>% 
  fisher.test()

    Fisher's Exact Test for Count Data

data:  .
p-value = 0.000169
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.3546500 0.7342262
sample estimates:
odds ratio 
  0.511352 
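The expected frequencies follow from the margins of the observed table: each cell is its row total times its column total, divided by the grand total. A base-R sketch that rebuilds them from the observed counts printed above (the counts are typed in by hand so the example runs without the dataset):

```r
# Observed counts copied from the table above
observed <- matrix(c(61, 884, 81, 600), nrow = 2,
                   dimnames = list(HHclass = c("groot", "niet groot"),
                                   district = c("Paramaribo", "Wanica")))

# Expected count per cell: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected

# chisq.test() stores the same matrix in its result
X2 <- chisq.test(observed)
all.equal(unname(expected), unname(X2$expected))
```

Large gaps between observed and expected counts are what drive the test statistic.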

Missing values

library(mice)
library(ggmice)
imp <- mice(mice::nhanes2, print = FALSE)
imp$data %>% 
  plot_pattern()
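`plot_pattern()` draws the missing data pattern. A quick numeric check of similar information can be sketched in base R. The built-in `airquality` data stands in for `nhanes2` here, so that the example runs without `mice` installed:

```r
# Number of missing values per column
n_missing <- colSums(is.na(airquality))
n_missing

# Proportion of rows that are fully observed
mean(complete.cases(airquality))

# Distinct missingness patterns (TRUE = missing) and how often each occurs
patterns <- table(apply(is.na(airquality), 1, paste, collapse = " "))
patterns
```

With `mice` loaded, `md.pattern()` gives a comparable overview in a single call.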

Analyzing imputations

library(purrr)
imp %>% 
  complete("all") %>% # list of all imputed datasets
  map(~.x %$% mean(bmi))
$`1`
[1] 27.448

$`2`
[1] 26.492

$`3`
[1] 26.364

$`4`
[1] 26.82

$`5`
[1] 27.388
imp %>% 
  complete("all") %>% # list of all imputed datasets
  map_df(~.x %$% mean(bmi))
# A tibble: 1 × 5
    `1`   `2`   `3`   `4`   `5`
  <dbl> <dbl> <dbl> <dbl> <dbl>
1  27.4  26.5  26.4  26.8  27.4

Analyzing imputations: averaging

library(purrr)
imp %>% 
  complete("all") %>% # list of all imputed datasets
  map(~.x %$% mean(bmi)) %>% 
  reduce(`+`) / imp$m
[1] 26.9024
imp %>% 
  complete("all") %>% # list of all imputed datasets
  map_df(~.x %$% mean(bmi)) %>% 
  rowMeans()
[1] 26.9024
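In the first pipeline, `%>%` binds more tightly than `/`, so the division by `imp$m` is applied to the result of the whole pipe. What `reduce()` does can be sketched in base R with the five completed-data means printed earlier (typed in by hand here):

```r
# The five completed-data means of bmi shown earlier
means <- list(27.448, 26.492, 26.364, 26.82, 27.388)
m <- length(means)

# Reduce() adds the elements pairwise, like purrr::reduce() with `+`
Reduce(`+`, means) / m

# Equivalent: flatten the list and take the ordinary mean
mean(unlist(means))
```

Both lines return the pooled mean of 26.9024 reported above.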

Analyzing imputations: pooling

imp %>% 
  complete("all") %>% # list of all imputed datasets
  map(~.x %$% lm(bmi ~ chl)) %>% 
  pool() %>%  # pool the results
  summary() # summary of the pooled results
         term    estimate  std.error statistic       df      p.value
1 (Intercept) 21.14641280 4.45946158  4.741921 16.60657 0.0002005259
2         chl  0.02958428 0.02237211  1.322374 16.71592 0.2038528932

Plotting

library(ggplot2)
stata %>% 
  ggplot(aes(x = HHclass, y = HHsize, fill = HHclass)) + # plot HHsize per HHclass
  geom_bar(stat = "summary", fun = "mean") # bar chart of the mean HHsize per HHclass

Plotting: histogram

stata %>% 
  ggplot(aes(x = HHsize)) + # plot HHsize
  geom_histogram(binwidth = 1, fill = "blue", color = "black") + # histogram with a binwidth of 1
  labs(title = "Histogram of HHsize", x = "HHsize", y = "Frequency") + # title and axis labels
  theme_minimal() # minimal theme

Development

browser()

myfunction <- function(x) {
  # This function does something with x
  x_squared <- x^2 # squares x
  browser() # pauses execution here, like a breakpoint
  return(x_squared) # returns the square of x
}
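Calling `myfunction()` in an interactive session pauses at `browser()`, where you can inspect `x` and `x_squared`, step to the next line with `n`, continue with `c`, or quit with `Q`. A sketch of a conditional variant, assuming you only want to stop on suspicious input; the function name `square_checked` is illustrative:

```r
square_checked <- function(x) {
  x_squared <- x^2
  # browser() only opens the debugger when `expr` evaluates to TRUE,
  # here when the input contains missing values
  browser(expr = anyNA(x))
  x_squared
}

square_checked(3)  # no missing values, so the debugger is not opened
```

If you would rather not edit the function at all, `debugonce(myfunction)` steps through the next call only.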

reprex()

Demo

Building an R package live